ABSTRACT

Consider the linear regression model $Y = X\theta + \epsilon$, where $Y$
denotes a vector of $n$ observations on the dependent variable,
$X$ is a known matrix, $\theta$ is a vector of parameters to be estimated,
and $\epsilon$ is a random vector of uncorrelated errors. If $X'X$ is
nearly singular, that is, if the smallest characteristic root of
$X'X$ is small, then a small perturbation in the elements of $X$, such
as that due to measurement errors, induces considerable variation in
the least squares estimate of $\theta$. In this paper we examine, for
the asymptotic case when $n$ is large, the effect of perturbation
on the bias and mean squared error of the estimate.

Key words: Linear regression; least squares estimate;
mean squared error.
1. Introduction. Consider the linear regression model

(1.1)  $Y = X\theta + \epsilon$

where $Y$ is an $n \times 1$ vector of observations, $X$ is a fixed $n \times p$ matrix
of rank $p$, $\theta$ is a $p \times 1$ vector of unknown parameters to be estimated
and $\epsilon$ is an $n \times 1$ vector of random errors. Let the components of $\epsilon$
be uncorrelated and identically distributed with mean zero and
variance $\sigma^2$, say. Let $\lambda_1, \ldots, \lambda_p$ denote the characteristic roots
of $X'X$, where prime denotes the transpose of a matrix. The least
squares estimate of $\theta$ and the sum of mean squared errors (SMSE) of
the components of $\hat\theta$ are given by


(1.2)  $\hat\theta = (X'X)^{-1}X'Y$

(1.3)  $\mathrm{SMSE}\,\hat\theta = E(\hat\theta-\theta)'(\hat\theta-\theta) = \sigma^2 \sum_{i=1}^{p} \lambda_i^{-1}$.


Clearly, $\hat\theta$ is an unbiased estimator of $\theta$. From (1.3) it is
seen that if $X'X$ is nearly singular, that is, if one or more of
the values of $\lambda_i$ is small, then $\hat\theta$ is unstable in the sense that
the variance of some of the components of $\hat\theta$ is large. A small
value of $\lambda_i$ may arise from certain interrelationships among the
independent variables of the linear model. Such a relation is
called multicollinearity in econometrics.

Suppose that the elements of $X$ are subjected to small random
perturbations, such as those due to measurement errors. From (1.2) it
is clear that the least squares estimator $\hat\theta$ is then no longer unbiased


for $\theta$. Beaton, Rubin and Barone (1976) considered a set of
data proposed by Longley (1967) for regression analysis to
find the effect of perturbation. They introduced perturbation as
round-off errors in the numerical values of the elements of $X$.
From an extensive empirical study they found that the regression
analysis could be very sensitive to small perturbations. The
authors concluded from their study that "the computer program
is often not the most important factor in computing regression
analysis, and that the best thing a program can do for some problems
is to refuse to complete the calculations". This conclusion seems
naive (see Dent and Cavendar (1977) and Espasa (1977) for
comments on the authors' paper). The problem arises from the choice
of the estimator, namely the least squares estimator, which is
unstable when the design matrix $X'X$ is nearly singular. The
difficulty can be overcome by choosing some other estimator, such
as the "ridge" estimator, given by $\delta = (X'X + KI)^{-1}X'Y$, where $K$
is a positive number. But then $\delta$ is not unbiased.
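The contrast between the two estimators can be illustrated numerically. The following is a minimal sketch and is not taken from the paper: the simulated data, the use of rounding to two decimals as the perturbation, and the choice $K = 1$ are all assumptions made for illustration only.

```python
import numpy as np

def ols(X, y):
    # Least squares estimate (X'X)^{-1} X'y, computed with lstsq.
    return np.linalg.lstsq(X, y, rcond=None)[0]

def ridge(X, y, K):
    # Ridge estimate (X'X + K I)^{-1} X'y for a positive constant K.
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + K * np.eye(p), X.T @ y)

rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
x2 = x1 + 1e-3 * rng.normal(size=n)    # nearly collinear column (illustrative)
X = np.column_stack([x1, x2])
theta = np.array([1.0, 1.0])           # hypothetical parameter vector
y = X @ theta + rng.normal(size=n)

# Round X to two decimals to mimic a small round-off perturbation.
X_pert = np.round(X, 2)

# How far each estimate moves when X is replaced by the rounded X.
ols_shift = np.linalg.norm(ols(X_pert, y) - ols(X, y))
ridge_shift = np.linalg.norm(ridge(X_pert, y, 1.0) - ridge(X, y, 1.0))
print("shift in least squares estimate:", ols_shift)
print("shift in ridge estimate:        ", ridge_shift)
```

In this setup the least squares estimate is expected to move far more than the ridge estimate, which is the instability described above; the ridge estimate pays for its stability with bias.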

In this paper we examine the behavior of the least squares
estimator when $n$ is large and $X$ is subjected to a random
perturbation. Formulas are given for the asymptotic bias and variance.
The relation between the bias and the eigenvalues of $X'X$ is shown
through a canonical representation of the parameter $\theta$. It is seen
that the smaller the eigenvalue, the larger is the associated bias.
The given formulas are checked against an empirical result obtained
by the Monte Carlo method.

In a recent paper, Stewart (1977) has given an upper bound on
the deviation of the least squares estimator due to a given
perturbation in $X$. But Stewart's method is not applicable to the
derivation of the results given in this paper.


2. Main results. Let $F$ denote the perturbation matrix. That
is, $X+F$ represents the perturbed matrix of the independent
variables of the linear model (1.1). Suppose that the elements
of $F$ are uncorrelated random variables, distributed independently
of $\epsilon$ with mean zero and common variance $v$, say. The least squares
estimator of $\theta$ for the perturbed data set is given by

(2.1)  $\theta^* = ((X+F)'(X+F))^{-1}(X+F)'Y$.

Therefore

(2.2)  $E\theta^* = E((X+F)'(X+F))^{-1}(X+F)'X\theta$
       $= \theta - E((X+F)'(X+F))^{-1}(X+F)'F\theta$,

where the expectation in the second line on the right side of
(2.2) is with respect to the distribution of the perturbation
errors. Formula (2.2) gives the bias of $\theta^*$.

Let the rows of the matrix $X$ be extended such that the elements of $X$
are uniformly bounded and the characteristic roots of $X'X$ are given by
$\lambda_i = n\nu_i + O(n^{1/2})$, where $\nu_1, \ldots, \nu_p$ are a fixed set of positive
numbers. Let $\alpha = P\theta$ and $\alpha^* = P\theta^*$, where $P$ is an orthogonal matrix
diagonalizing $X'X$. Multiplying both sides of (2.2) by $P$ and equating
the $i$th component of the resulting vector on each side, we have
after simplification

(2.3)  $E\alpha_i^* = \alpha_i - \left(\frac{v}{\nu_i + v} + O(n^{-1/2})\right)\alpha_i$.

Similarly, the variance and mean squared error of $\alpha_i^*$ are given by

(2.4)  $n \operatorname{var} \alpha_i^* = \frac{\sigma^2}{\nu_i + v}\,(1 + O(n^{-1/2}))$,

(2.5)  $n E(\alpha_i^* - \alpha_i)^2 = \frac{\sigma^2}{\nu_i + v}\,(1 + O(n^{-1/2})) + \frac{n v^2 \alpha_i^2}{(\nu_i + v)^2}\,(1 + O(n^{-1/2}))$.


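Therefore

(2.6)  $n\,\mathrm{SMSE}\,\theta^* = n E(\theta^*-\theta)'(\theta^*-\theta)$
       $= n E(\alpha^*-\alpha)'(\alpha^*-\alpha)$
       $= \sum_{i=1}^{p} \frac{\sigma^2}{\nu_i + v}\,(1 + O(n^{-1/2})) + \sum_{i=1}^{p} \frac{n v^2 \alpha_i^2}{(\nu_i + v)^2}\,(1 + O(n^{-1/2}))$.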


If $v = 0$, that is, if there is no perturbation, then $E\alpha_i^* = \alpha_i$.
From (2.3) it is seen that the relative bias of $\alpha_i^*$ is small if $v$ is
small compared to $\nu_i$, as should be expected. On the other hand,
if $\nu_i$ is small compared to $v$ then the relative bias of $\alpha_i^*$ is nearly
equal to $-1$.
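A heuristic reading of (2.2) shows where the factor $v/(\nu_i + v)$ comes from. The sketch below is an informal argument under the stated assumptions (bounded elements of $X$, uncorrelated elements of $F$ with mean zero and variance $v$); it replaces random quantities by their leading-order values and is not presented as the paper's own derivation.

```latex
\begin{align*}
(X+F)'(X+F) &= X'X + nvI + O_p(n^{1/2}), \\
(X+F)'F     &= nvI + O_p(n^{1/2}), \\
E\theta^* - \theta &\approx -\,(X'X + nvI)^{-1}\, nv\, \theta .
\end{align*}
% Premultiply by P, using P X'X P' = diag(lambda_1, ..., lambda_p)
% and lambda_i \approx n \nu_i:
\begin{align*}
E\alpha_i^* - \alpha_i
  \approx -\frac{nv}{\lambda_i + nv}\,\alpha_i
  \approx -\frac{v}{\nu_i + v}\,\alpha_i .
\end{align*}
```

The ratio $v/(\nu_i + v)$ lies between 0 and 1, tending to 0 when $v$ is negligible relative to $\nu_i$ and to 1 when $\nu_i$ is negligible relative to $v$, which gives the two limiting cases of the relative bias.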

From (2.4) it is seen that for $v = 0$ we have $\operatorname{var} \alpha_i^* = \frac{\sigma^2}{n\nu_i} \approx \frac{\sigma^2}{\lambda_i}$,
which agrees with the result given in (1.3). To see the relation
between the effect of perturbation on the variance of $\alpha_i^*$ and
the associated eigenvalue of $X'X$, we write (2.4) as follows:

(2.7)  $n \operatorname{var} \alpha_i^* = \frac{\sigma^2}{\nu_i} - \frac{\sigma^2 v}{\nu_i(\nu_i + v)} + O(n^{-1/2})$.

From (2.7) it is seen that the perturbation of $X$ has a stabilizing
influence on the least squares estimate. But the reduction in the
variance should be weighed against the induced bias.
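The rewriting of the leading term of the variance is a partial-fraction identity. Written out with the remainder terms suppressed (a short sketch for the reader's convenience):

```latex
\[
\frac{\sigma^2}{\nu_i + v}
  = \frac{\sigma^2}{\nu_i}\cdot\frac{\nu_i}{\nu_i + v}
  = \frac{\sigma^2}{\nu_i}\Bigl(1 - \frac{v}{\nu_i + v}\Bigr)
  = \frac{\sigma^2}{\nu_i} - \frac{\sigma^2 v}{\nu_i(\nu_i + v)} .
\]
```

The first term is the value without perturbation; the subtracted term is the reduction attributable to the perturbation, and it is largest relative to the first term when $\nu_i$ is small compared to $v$.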

To verify the asymptotic formulas given above, we have carried
out the regression analysis under perturbation with a $16 \times 6$ matrix $X$,
obtained from the data proposed by Longley (1967). However, the
matrix was modified by certain changes in scale and origin. The
characteristic roots of the modified matrix are given by $\lambda_i = 16\nu_i$,
where

$\nu_1 = .2188(10)^{-2}$,  $\nu_2 = .3705(10)^{-1}$,  $\nu_3 = .2005(10)^{1}$,
$\nu_4 = .1118(10)^{2}$,  $\nu_5 = .1282(10)^{3}$,  $\nu_6 = .3596(10)^{4}$.

From the given values of $\nu_i$ we generate as follows an $n \times p$ matrix
$Z$ for large $n$ such that the characteristic roots of $Z'Z$ are
approximately given by $\lambda_i = n\nu_i + O(n^{1/2})$: Generate a $p$-component
vector $U$ whose components are independently and identically
distributed as $N(0,1)$. Compute

$T = P'\sqrt{D}\,U$

where $P$ is the orthogonal matrix diagonalizing $X'X$ and $D$ denotes
the diagonal matrix with diagonal elements equal to $\nu_i$,
$i = 1, \ldots, 6$. Generate $n$ independent values of $T$ and set them
equal to the columns of $Z'$.
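The construction of $Z$ can be sketched in a few lines. This is a minimal illustration only: the matrix $P$ for the Longley data is not reproduced in the text, so a randomly generated orthogonal matrix stands in for it, and the values of $\nu_i$ used here are hypothetical.

```python
import numpy as np

def generate_Z(n, P, nu, rng):
    # Each row of Z is T' with T = P' sqrt(D) U, where U ~ N(0, I_p)
    # and D = diag(nu); hence T' = U' sqrt(D) P.
    U = rng.standard_normal((n, len(nu)))
    return (U * np.sqrt(nu)) @ P

rng = np.random.default_rng(1)
nu = np.array([0.5, 2.0, 10.0])                    # hypothetical nu_i
P, _ = np.linalg.qr(rng.standard_normal((3, 3)))   # stand-in orthogonal P
Z = generate_Z(20000, P, nu, rng)

# Characteristic roots of Z'Z divided by n should be close to nu.
roots = np.sort(np.linalg.eigvalsh(Z.T @ Z / Z.shape[0]))
print(roots)
```

Because the covariance matrix of $T$ is $P'DP$, the roots of $Z'Z/n$ approach the $\nu_i$ as $n$ grows, with sampling error of order $n^{-1/2}$.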

For each $Z$ we generate the error vector $\epsilon$ whose components
are independently and identically distributed as $N(0,1)$, that is,
$\sigma = 1$. Then we compute $Y$ from the formula $Y = Z\theta + \epsilon$, where the
components of $\theta$ are given by

$\theta_1 = .0151$,  $\theta_2 = -.3582$,  $\theta_3 = -.2020$,
$\theta_4 = -.1033$,  $\theta_5 = -.5110$,  $\theta_6 = .1829$.

The value of $\theta$ given above is the least squares estimate of $\theta$
computed from the data given by Longley. For the discussion in
this paper any other value of $\theta$ could have been assumed as well.

The matrix $Z$ is perturbed by adding to each element of $Z$
independent values of a random variable $\xi$, distributed uniformly
on $(-\frac{1}{2}, \frac{1}{2})$, giving $v = \frac{1}{12}$.

The results of the regression analysis are shown in Table I
below. The figures given in the table for the asymptotic bias and
mean squared error of the least squares estimate are obtained from
the formulas (2.3) and (2.5). The figures for the empirical
values given in the table are each based on 500 simulations.
They were found to be fairly accurate, by checking duplicate
values. It is seen from the table that there is fair agreement
between the theoretical and empirical figures.
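A small Monte Carlo experiment of this kind can be sketched as follows. It does not reproduce the paper's figures: the values of $\nu_i$ and $\alpha$ are hypothetical, $P$ is taken to be the identity so that $\theta = \alpha$, and the design is built from a QR factorization so that $Z'Z = n\,\mathrm{diag}(\nu)$ holds exactly, which differs from the random generation of $Z$ described in the text. Only the bias is compared closely with its asymptotic value; the mean squared errors are printed for inspection.

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 500, 400
nu = np.array([0.05, 2.0])      # hypothetical nu_i
alpha = np.array([1.0, 1.0])    # hypothetical parameter; P = I, so theta = alpha
sigma, v = 1.0, 1.0 / 12.0

# Fixed design with orthogonal columns scaled so that Z'Z = n * diag(nu).
Q, _ = np.linalg.qr(rng.standard_normal((n, nu.size)))
Z = Q * np.sqrt(n * nu)

est = np.empty((reps, nu.size))
for r in range(reps):
    y = Z @ alpha + sigma * rng.standard_normal(n)
    F = rng.uniform(-0.5, 0.5, size=Z.shape)   # element variance 1/12
    # Least squares estimate computed from the perturbed matrix Z + F.
    est[r] = np.linalg.lstsq(Z + F, y, rcond=None)[0]

emp_bias = est.mean(axis=0) - alpha
emp_nmse = n * ((est - alpha) ** 2).mean(axis=0)

# Leading terms of the asymptotic bias and n * MSE.
asym_bias = -v / (nu + v) * alpha
asym_nmse = sigma**2 / (nu + v) + n * v**2 * alpha**2 / (nu + v) ** 2

print("bias   asym:", asym_bias, " emp:", emp_bias)
print("n MSE  asym:", asym_nmse, " emp:", emp_nmse)
```

The component with the smaller $\nu_i$ carries the larger bias, in line with the discussion of the relative bias.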


Table I - Asymptotic (Asym) and Empirical (Emp) values of
$E\alpha_i^* - \alpha_i$ and $n\,\mathrm{MSE}(\alpha_i^*)$ for $v = \frac{1}{12}$ and $n = 500$.

  i      E a*_i - a_i            n MSE(a*_i)
         Asym       Emp          Asym       Emp
  1    -.3713    -.3914       80.6393    76.7583
  2     .3028     .4482       54.1350    50.6663
  3     .0027     .0596         .4825     1.9906
  4     .0004     .0353         .0889      .7266
  5    -.0001    -.0253         .0078      .3884
  6       *      -.0025         .0003      .0047

* Denotes insignificant figure
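The asymptotic mean squared error column can be checked arithmetically against the asymptotic bias column. The sketch below uses the leading term of the mean squared error, $\sigma^2/(\nu_i + v)$ plus $n$ times the squared asymptotic bias, with the $\nu_i$, $v = 1/12$, $n = 500$ and $\sigma = 1$ stated in the text. The entry marked insignificant is taken as zero, which is an assumption; small differences are expected because the tabulated biases are rounded to four decimals.

```python
import numpy as np

n, sigma, v = 500, 1.0, 1.0 / 12.0
nu = np.array([0.2188e-2, 0.3705e-1, 0.2005e1, 0.1118e2, 0.1282e3, 0.3596e4])

# Asymptotic bias column of Table I; the i = 6 entry is reported as
# insignificant and is set to 0 here (an assumption).
asym_bias = np.array([-0.3713, 0.3028, 0.0027, 0.0004, -0.0001, 0.0])
table_nmse = np.array([80.6393, 54.1350, 0.4825, 0.0889, 0.0078, 0.0003])

# Leading term: variance part plus n times the squared asymptotic bias.
nmse = sigma**2 / (nu + v) + n * asym_bias**2

for i, (a, b) in enumerate(zip(nmse, table_nmse), start=1):
    print(f"i={i}: formula {a:.4f}   table {b:.4f}")
```

For the first two components nearly all of the mean squared error comes from the squared bias term, while for the remaining components the variance term $\sigma^2/(\nu_i + v)$ dominates.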